Purpose

This document shows my approach to understanding precipitation in Santa Barbara County: accessing and analyzing precipitation data sets, then applying that analysis to understand the effect of precipitation on the Cachuma Reservoir.

Precipitation

Gathering Data

Rainfall data was gathered from the Santa Barbara County Website.

This data consisted of 81 separate .xls files, each from a different rainfall gauge in Santa Barbara County, which I put into a single folder.

I then created a function to read and clean these files. A for loop was used to apply this function to all 81 files and put all of the data into a single data frame.
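The read-and-combine step might look roughly like the sketch below. The layout of the county's files is not described here, so the column names (`station_id`, `date`, `daily_rain`), the helper names `read_gauge()` and `read_all_gauges()`, and the `reader` argument are all assumptions made for illustration. For .xls files the reader could be something like `readxl::read_excel`.

```r
# Hypothetical cleaner for one gauge file. `reader` is any function that
# takes a path and returns a data frame (e.g. readxl::read_excel for .xls).
read_gauge <- function(path, reader) {
  raw <- as.data.frame(reader(path))
  # Assumed column names; the real files may need renaming or skipped rows.
  out <- raw[, c("station_id", "date", "daily_rain")]
  out$date <- as.Date(out$date)
  out$daily_rain <- as.numeric(out$daily_rain)
  out
}

# Loop over every matching file in a folder and stack the cleaned pieces
# into a single data frame.
read_all_gauges <- function(folder, reader, pattern = "\\.xls$") {
  files <- list.files(folder, pattern = pattern, full.names = TRUE)
  pieces <- vector("list", length(files))
  for (i in seq_along(files)) {
    pieces[[i]] <- read_gauge(files[i], reader)
  }
  do.call(rbind, pieces)
}
```

Collecting the pieces in a list and binding them once at the end avoids growing the data frame inside the loop, which gets slow as the number of files increases.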

A separate function and for loop were used to extract the location data for each rainfall gauge.

After this process I had two data frames: one with daily rainfall totals by date and station id, and another with the latitude and longitude coordinates of each station (rainfall gauge).

Wrangling the Data

The next step was to wrangle this data to produce data frames which could be used for analysis.

Plotting the points

I first used ggmap::get_stamenmap() to get a map of Santa Barbara County.

I then created another data frame containing, for each station id, the yearly rainfall total averaged over all the years in the data set.
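One way to build this table, sketched in base R, is to sum rainfall per station and year and then average those yearly totals per station. The column names (`station_id`, `date`, `daily_rain`) and the toy values are assumptions for illustration, and the sketch groups by calendar year; a water-year grouping would need a different year definition.

```r
# Toy daily data standing in for the combined gauge data frame.
daily <- data.frame(
  station_id = c(1, 1, 1, 1, 2, 2),
  date = as.Date(c("2019-01-05", "2019-12-01", "2020-02-10", "2020-03-01",
                   "2019-01-05", "2020-02-10")),
  daily_rain = c(1.0, 2.0, 3.0, 1.0, 0.5, 1.5)
)
daily$year <- as.integer(format(daily$date, "%Y"))

# Total rainfall for each station in each calendar year.
yearly <- aggregate(daily_rain ~ station_id + year, data = daily, FUN = sum)
names(yearly)[names(yearly) == "daily_rain"] <- "yearly_rain"

# Mean of the yearly totals for each station.
avg_yearly <- aggregate(yearly_rain ~ station_id, data = yearly, FUN = mean)
```

Joining `avg_yearly` to the station coordinates by `station_id` would then give one point per gauge to plot.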

This allowed me to plot these points onto the map.

Average yearly precipitation in Santa Barbara County

Kriging

The next step was to spatially interpolate the data to get a better idea of the precipitation in the entire county.

I decided to use the form of spatial interpolation known as Ordinary Kriging.

The first step in this process was to create a variogram, which describes how the spatial dependence between stations changes with distance. This was done using gstat::variogram(). The function automap::autofitVariogram() was then used to choose the variogram model that best fits the data. Finally, after defining a target grid, gstat::krige() was used to generate the set of predictions.
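The three calls fit together roughly as in the sketch below. It runs on synthetic stations rather than the county data, assumes the `sp`, `gstat`, and `automap` packages are installed, and uses made-up projected `x`/`y` coordinates in metres; the object names are hypothetical.

```r
library(sp)
library(gstat)
library(automap)

set.seed(42)
# Synthetic stand-in for the station table: projected coordinates and an
# average yearly rainfall value. None of these numbers come from the report.
stations <- data.frame(x = runif(40, 0, 100000), y = runif(40, 0, 60000))
stations$avg_precip <- 15 + stations$y / 10000 + rnorm(40, sd = 0.5)
coordinates(stations) <- ~ x + y

# Empirical variogram: semivariance as a function of separation distance.
emp_vgm <- variogram(avg_precip ~ 1, stations)

# Let automap choose and fit a variogram model.
fit <- autofitVariogram(avg_precip ~ 1, stations)

# Target grid covering the same extent (21 x 13 cells).
grid <- expand.grid(x = seq(0, 100000, by = 5000),
                    y = seq(0, 60000, by = 5000))
coordinates(grid) <- ~ x + y
gridded(grid) <- TRUE

# Ordinary kriging (intercept-only formula) onto the grid.
pred <- krige(avg_precip ~ 1, stations, grid, model = fit$var_model)
```

The result carries the predictions in `var1.pred` and the kriging variance in `var1.var`, so the variance can be mapped alongside the prediction to see where the interpolation is least certain.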

Spatial interpolation of average yearly precipitation in Santa Barbara County

To get a better understanding of how precipitation has been changing over the years, I looped this process over a subset of years.

Spatially interpolated yearly precipitation in Santa Barbara County

Cachuma Reservoir

With the precipitation data sorted out, I could now apply it to understanding the effect of precipitation on water level change in the Cachuma Reservoir. The Cachuma Reservoir is heavily relied upon by the city of Santa Barbara, which is entitled to 32.19% of its available water. Understanding how rainfall affects this reservoir is vitally important.

Reservoir data

I found reservoir level data for the Cachuma Reservoir on the County of Santa Barbara Public Works website.

Fairly consistent data was provided going back to 2015. Reservoir levels were measured at 15-minute intervals, in feet.
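Because the regression later in this report works with monthly changes in level, readings at this resolution have to be collapsed first. The sketch below shows one possible way to do that on synthetic readings. It assumes the monthly change is the last reading of the month minus the first, which may not be the definition used in this analysis, and the column names are hypothetical.

```r
# Synthetic 15-minute readings in feet; values are made up for illustration.
times <- seq(as.POSIXct("2019-01-01 00:00", tz = "UTC"),
             as.POSIXct("2019-02-28 23:45", tz = "UTC"), by = "15 min")
levels_15min <- data.frame(
  time = times,
  level_ft = 700 + seq(0, 5, length.out = length(times))
)
levels_15min$month <- format(levels_15min$time, "%Y-%m")

# Assumed definition: change = last reading of the month minus the first.
by_month <- split(levels_15min, levels_15min$month)
monthly_change <- data.frame(
  month = names(by_month),
  monthly_level_change = vapply(
    by_month,
    function(d) {
      d <- d[order(d$time), ]
      d$level_ft[nrow(d)] - d$level_ft[1]
    },
    numeric(1)
  ),
  row.names = NULL
)
```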

With this data, I was able to make some quick plots.

Cachuma Reservoir levels and level changes over time

Statistical Analysis

More Kriging

After matching up the reservoir data with the precipitation data, I wanted to see which stations recorded rainfall that correlated most strongly with changes in reservoir water level. To do this, I created a function that assigns an r squared value to a station based on a regression of the monthly change in reservoir level on the monthly total rainfall at that station.

Initially I performed a simple linear regression using the lm() function. Later in the analysis, I found that a second-degree polynomial regression, done using lm(y ~ poly(x, 2)), fit the data much better, so I used a polynomial regression model in this function. The equation for this model is \[\operatorname{monthly\_level\_change} = \alpha + \beta_{1}(\operatorname{month\_precip}) + \beta_{2}(\operatorname{month\_precip})^{2} + \epsilon\]
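A note on reading such a fit: by default `poly()` builds orthogonal polynomial terms, so the coefficients R reports are not the α, β1, and β2 of the raw equation above, although the fitted values and r squared are the same. The sketch below, on synthetic data rather than the station data, compares a straight-line fit, the default `poly()` fit, and a `raw = TRUE` fit whose coefficients do map onto the equation.

```r
set.seed(1)
# Synthetic monthly data with a curved relationship; not the station data.
month_precip <- runif(48, 0, 12)
monthly_level_change <- -1 + 0.2 * month_precip + 0.15 * month_precip^2 +
  rnorm(48, sd = 2)
sim <- data.frame(month_precip, monthly_level_change)

# Straight-line fit for comparison.
fit_linear <- lm(monthly_level_change ~ month_precip, data = sim)

# Second-degree fit with orthogonal terms (the poly() default).
fit_orth <- lm(monthly_level_change ~ poly(month_precip, 2), data = sim)

# Same model with raw terms: coefficients are alpha, beta_1, beta_2.
fit_raw <- lm(monthly_level_change ~ poly(month_precip, 2, raw = TRUE),
              data = sim)

r2 <- c(
  linear = summary(fit_linear)$r.squared,
  orthogonal = summary(fit_orth)$r.squared,
  raw = summary(fit_raw)$r.squared
)
```

The orthogonal form is numerically better behaved and makes the t-tests on each degree independent of one another, while the raw form is the one to use when the coefficients themselves need to be interpreted.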

I then looped the function over all of the 81 stations and put the results into a new data frame.
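A minimal sketch of that function and loop follows, run on three synthetic stations. The helper name `station_r2()`, the column names, and the data are hypothetical and only illustrate the pattern.

```r
# Hypothetical helper: r squared of the second-degree fit for one station.
station_r2 <- function(d) {
  d <- d[complete.cases(d[, c("month_precip", "monthly_level_change")]), ]
  fit <- lm(monthly_level_change ~ poly(month_precip, 2), data = d)
  summary(fit)$r.squared
}

set.seed(2)
# Synthetic monthly table for three stations with increasing noise.
monthly <- do.call(rbind, lapply(1:3, function(id) {
  precip <- runif(36, 0, 10)
  noise_sd <- c(1, 4, 10)[id]
  data.frame(
    station_id = id,
    month_precip = precip,
    monthly_level_change = 0.1 * precip^2 + rnorm(36, sd = noise_sd)
  )
}))

# Loop over stations and collect one r squared value per station.
ids <- unique(monthly$station_id)
r2_results <- data.frame(station_id = ids, r_squared = NA_real_)
for (i in seq_along(ids)) {
  r2_results$r_squared[i] <-
    station_r2(monthly[monthly$station_id == ids[i], ])
}

# Station with the strongest fit.
best <- r2_results[which.max(r2_results$r_squared), ]
```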

I thought a good way to visualize these r squared values would be another spatial interpolation, which would show how the correlation between rainfall and reservoir water level change varies across the county.

Correlation between precipitation and reservoir level change based on r squared value attained through linear regression analysis

The station with the highest correlation was station 238, with an r squared value of 0.844.

Plotting the values this way was a useful check that the correlation makes physical sense. Station 238 does appear to be in an area that would drain into the Cachuma Reservoir.

I chose to use station 238 for the remainder of my analysis.

Here is the summary of the polynomial regression model for station 238.

## 
## Call:
## lm(formula = monthly_level_change ~ poly(month_precip, 2), data = station_238_monthly)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5733  -0.5385   0.1320   1.1200   9.2592 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              1.5817     0.5054   3.130  0.00307 ** 
## poly(month_precip, 2)1  47.5492     3.5622  13.348  < 2e-16 ***
## poly(month_precip, 2)2  27.2486     3.5544   7.666 1.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.495 on 45 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8438, Adjusted R-squared:  0.8368 
## F-statistic: 121.5 on 2 and 45 DF,  p-value: < 2.2e-16

I then put together a graph of the data with the fitted model, as well as graphs of the residuals.

Relationship between precipitation at station 238 and reservoir water level change

Plots

Revisiting the plots I made of the Cachuma Reservoir water levels earlier, I could now confidently add the precipitation data from station 238.

Daily and monthly interaction between precipitation at station 238 and reservoir water level
